{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4c89cfe",
   "metadata": {},
   "outputs": [],
   "source": [
    "情感分类的步骤：\n",
    "1、数据预处理\n",
    "2、分词\n",
    "3、提取特征\n",
    "4、训练\n",
    "5、测试\n",
    "6、分析数据"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "554f277d",
   "metadata": {},
   "source": [
    "# 体验"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "dbde4d6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from snownlp import SnowNLP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c8324417",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"这个菜有点难吃。谁也不想吃。\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "92d8c4c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = SnowNLP(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8c5217a7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.19434392895747854"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.sentiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "229282fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "t2 = \"这首歌真难听\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8f9b17ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = SnowNLP(t2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "487d5eb7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7939342345098661"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.sentiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8707271b",
   "metadata": {},
   "outputs": [],
   "source": [
    "t2 = \"今天天气真好啊\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e094b4a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = SnowNLP(t2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "936748a1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.46129045755346965"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.sentiments"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0b439dd",
   "metadata": {},
   "source": [
    "# 训练测试"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a78cca91",
   "metadata": {},
   "source": [
    "准备消极和积极情感的文本，分别为neg.txt和pos.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1750ef70",
   "metadata": {},
   "source": [
    "## 训练 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "17ba4721",
   "metadata": {},
   "outputs": [],
   "source": [
    "from snownlp import sentiment\n",
    "sentiment.train('neg.txt', 'pos.txt')\n",
    "sentiment.save(\"sentiment.marshal\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "862bf22c",
   "metadata": {},
   "source": [
    "将训练好的模型sentiment.marshal放到软件snownlp工具包安装路径下"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8850f098",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.11911524242112959"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "t2 = \"这首歌真难听\"\n",
    "s = SnowNLP(t2)\n",
    "s.sentiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "7001e73b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7255412246747047"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "t2 = \"今天天气真好啊\"\n",
    "s = SnowNLP(t2)\n",
    "s.sentiments"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "575d6496",
   "metadata": {},
   "source": [
    "## 测试"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "58f440c2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>comment</th>\n",
       "      <th>sentiment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>服务太差，一个菜半小时上不来，最后服务员说弄错了，无语啊……</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10月4日，我们3人去吃的，到家后都拉肚子，还恶心，以后不会再去了。</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>蛋糕好干，颗粒好粗，38元极度不值，第戎完爆，饮品真的只喝了几口，就全是冰了，我好渴好不好，...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>如果能打负分的话一定给他打负分，可惜没有！跟朋友聚会去的这家自助、大周五的差不多8点应该也不...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>看了这么多的好评如潮，我都感觉自己去的跟你们去的并不是一个店了。我们是五个人中午过去吃的。我...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             comment  sentiment\n",
       "0                     服务太差，一个菜半小时上不来，最后服务员说弄错了，无语啊……          0\n",
       "1                 10月4日，我们3人去吃的，到家后都拉肚子，还恶心，以后不会再去了。          0\n",
       "2  蛋糕好干，颗粒好粗，38元极度不值，第戎完爆，饮品真的只喝了几口，就全是冰了，我好渴好不好，...          0\n",
       "3  如果能打负分的话一定给他打负分，可惜没有！跟朋友聚会去的这家自助、大周五的差不多8点应该也不...          0\n",
       "4  看了这么多的好评如潮，我都感觉自己去的跟你们去的并不是一个店了。我们是五个人中午过去吃的。我...          0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#准确率\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "data_test = pd.read_excel(\"test.xlsx\")\n",
    "data_test.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "a9401e90",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "准确率为： 0.7852112676056338\n"
     ]
    }
   ],
   "source": [
    "s = []\n",
    "for c in data_test['comment']:\n",
    "    score = SnowNLP(c).sentiments\n",
    "    if score>=0.5:\n",
    "        s.append(1)\n",
    "    else:\n",
    "        s.append(0)\n",
    "count = np.sum((s == data_test['sentiment'])==1)\n",
    "print('准确率为：',count/len(data_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b73ef716",
   "metadata": {},
   "source": [
    "# 分析数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "0c3690ad",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>comment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>菜品口味还行，上菜太慢，一杯鲜榨果汁居然四十分钟才上。薯条非常好吃，是我吃过最好吃的</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>铁板蛏子都是沙子，酱爆海兔超咸，海参捞饭里的海参硬硬的。评论都是骗人的……</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>重盐??油大??号称东北菜??东北菜什么时候这么油了？唯一的一个素菜??那个油啊??唉茄盒创...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>有点流于表面??菜相当一般??甚至有点难吃??别的都OK??但是饭馆最重要的还是菜啊??很坑</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>太差了,见过次的没见过这么次的。首先说稍显不次的，平心而论他的口味实在是太对不起它的价格了价...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             comment\n",
       "0         菜品口味还行，上菜太慢，一杯鲜榨果汁居然四十分钟才上。薯条非常好吃，是我吃过最好吃的\n",
       "1              铁板蛏子都是沙子，酱爆海兔超咸，海参捞饭里的海参硬硬的。评论都是骗人的……\n",
       "2  重盐??油大??号称东北菜??东北菜什么时候这么油了？唯一的一个素菜??那个油啊??唉茄盒创...\n",
       "3     有点流于表面??菜相当一般??甚至有点难吃??别的都OK??但是饭馆最重要的还是菜啊??很坑\n",
       "4  太差了,见过次的没见过这么次的。首先说稍显不次的，平心而论他的口味实在是太对不起它的价格了价..."
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_excel(\"data.xlsx\")\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "35f0e80c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>comment</th>\n",
       "      <th>sentiment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>菜品口味还行，上菜太慢，一杯鲜榨果汁居然四十分钟才上。薯条非常好吃，是我吃过最好吃的</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>铁板蛏子都是沙子，酱爆海兔超咸，海参捞饭里的海参硬硬的。评论都是骗人的……</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>重盐??油大??号称东北菜??东北菜什么时候这么油了？唯一的一个素菜??那个油啊??唉茄盒创...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>有点流于表面??菜相当一般??甚至有点难吃??别的都OK??但是饭馆最重要的还是菜啊??很坑</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>太差了,见过次的没见过这么次的。首先说稍显不次的，平心而论他的口味实在是太对不起它的价格了价...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             comment  sentiment\n",
       "0         菜品口味还行，上菜太慢，一杯鲜榨果汁居然四十分钟才上。薯条非常好吃，是我吃过最好吃的          0\n",
       "1              铁板蛏子都是沙子，酱爆海兔超咸，海参捞饭里的海参硬硬的。评论都是骗人的……          0\n",
       "2  重盐??油大??号称东北菜??东北菜什么时候这么油了？唯一的一个素菜??那个油啊??唉茄盒创...          0\n",
       "3     有点流于表面??菜相当一般??甚至有点难吃??别的都OK??但是饭馆最重要的还是菜啊??很坑          0\n",
       "4  太差了,见过次的没见过这么次的。首先说稍显不次的，平心而论他的口味实在是太对不起它的价格了价...          0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = []\n",
    "for c in data['comment']:\n",
    "    score = SnowNLP(c).sentiments\n",
    "    if score>=0.5:\n",
    "        s.append(1)\n",
    "    else:\n",
    "        s.append(0)\n",
    "data['sentiment'] = s\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "5c0e97f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.to_excel(\"data_result.xlsx\",index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8958f28",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
